57 research outputs found

Fluent Translations from Disfluent Speech in End-to-End Speech Translation

Author: Salesky Elizabeth
Sperber Matthias
Waibel Alex
Publication date: 01/01/2019

Spoken language translation applications for speech suffer due to conversational speech phenomena, particularly the presence of disfluencies. With the rise of end-to-end speech translation models, processing steps such as disfluency removal that were previously an intermediate step between speech recognition and machine translation need to be incorporated into model architectures. We use a sequence-to-sequence model to translate from noisy, disfluent speech to fluent text with disfluencies removed using the recently collected 'copy-edited' references for the Fisher Spanish-English dataset. We are able to directly generate fluent translations and introduce considerations about how to evaluate success on this task. This work provides a baseline for a new task, the translation of conversational speech with joint removal of disfluencies. Comment: Accepted at NAACL 201

arXiv.org e-Print Archive

Crossref

Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer

Author: Koehn Philipp
Post Matt
Salesky Elizabeth
Verma Neha
Publication date: 24/10/2023

We introduce and demonstrate how to effectively train multilingual machine translation models with pixel representations. We experiment with two different data settings with a variety of language and script coverage, demonstrating improved performance compared to subword embeddings. We explore various properties of pixel representations such as parameter sharing within and across scripts to better understand where they lead to positive transfer. We observe that these properties not only enable seamless cross-lingual transfer to unseen scripts, but make pixel representations more data-efficient than alternatives such as vocabulary expansion. We hope this work contributes to more extensible multilingual models for all languages and scripts. Comment: EMNLP 202

arXiv.org e-Print Archive

Tutorial: End-to-End Speech Translation

Author: Negri Matteo
Niehues Jan
Salesky Elizabeth
Turchi Marco
Publication venue: Association for Computational Linguistics
Publication date: 01/01/2021

Speech translation is the translation of speech in one language, typically to text in another, traditionally accomplished through a combination of automatic speech recognition and machine translation. Speech translation has attracted interest for many years, but the recent successful applications of deep learning to both individual tasks have enabled new opportunities through joint modeling, in what we today call 'end-to-end speech translation.' In this tutorial we will introduce the techniques used in cutting-edge research on speech translation. Starting from the traditional cascaded approach, we will give an overview of data sources and model architectures to achieve state-of-the-art performance with end-to-end speech translation for both high- and low-resource languages. In addition, we will discuss methods to evaluate and analyze the proposed solutions, as well as the challenges faced when applying speech translation models to real-world applications.

Maastricht University Research Portal

KITopen

Relative Positional Encoding for Speech Recognition and Direct Translation

Author: Ha Thanh-Le
Nguyen Thai-Son
Nguyen Tuan-Nam
Niehues Jan
Pham Ngoc-Quan
Salesky Elizabeth
Stueker Sebastian
Waibel Alexander
Publication date: 01/01/2020

Transformer models are powerful sequence-to-sequence architectures that are capable of directly mapping speech inputs to transcriptions or translations. However, the mechanism for modeling positions in this model was tailored for text modeling, and thus is less ideal for acoustic inputs. In this work, we adapt the relative position encoding scheme to the Speech Transformer, where the key addition is the relative distance between input states in the self-attention network. As a result, the network can better adapt to the variable distributions present in speech data. Our experiments show that our resulting model achieves the best recognition result on the Switchboard benchmark in the non-augmentation condition, and the best published result on the MuST-C speech translation benchmark. We also show that this model is able to better utilize synthetic data than the Transformer, and adapts better to variable sentence segmentation quality for speech translation. Comment: Submitted to Interspeech 202

arXiv.org e-Print Archive

Maastricht University Research Portal

Crossref

Evaluating Multilingual Speech Translation under Realistic Conditions with Resegmentation and Terminology

Author: Al-Badrashiny Mohamed
Darwish Kareem
Diab Mona
Niehues Jan
Salesky Elizabeth
Publication venue: Association for Computational Linguistics
Publication date: 02/08/2023

KITopen

On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss

Author: Carpuat Marine
Chronopoulou Alexandra
Federico Marcello
Fraser Alexander
Liu Yihong
Salesky Elizabeth
Schütze Hinrich
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/07/2023

Although unsupervised neural machine translation (UNMT) has achieved success in many language pairs, the copying problem, i.e., directly copying some parts of the input sentence as the translation, is common among distant language pairs, especially when low-resource languages are involved. We find this issue is closely related to an unexpected copying behavior during online back-translation (BT). In this work, we propose a simple but effective training schedule that incorporates a language discriminator loss. The loss imposes constraints on the intermediate translation so that the translation is in the desired language. By conducting extensive experiments on different language pairs, including similar and distant, high- and low-resource languages, we find that our method alleviates the copying problem, thus improving the translation performance on low-resource languages.

Open Access LMU